Clustering is among the most fundamental tasks in computer vision and machinelearning. In this paper, we propose Variational Deep Embedding (VaDE), a novelunsupervised generative clustering approach within the framework of VariationalAuto-Encoder (VAE). Specifically, VaDE models the data generative procedurewith a Gaussian Mixture Model (GMM) and a deep neural network (DNN): 1) the GMMpicks a cluster; 2) from which a latent embedding is generated; 3) then the DNNdecodes the latent embedding into observables. Inference in VaDE is done in avariational way: a different DNN is used to encode observables to latentembeddings, so that the evidence lower bound (ELBO) can be optimized usingStochastic Gradient Variational Bayes (SGVB) estimator and thereparameterization trick. Quantitative comparisons with strong baselines areincluded in this paper, and experimental results show that VaDE significantlyoutperforms the state-of-the-art clustering methods on 4 benchmarks fromvarious modalities. Moreover, by VaDE's generative nature, we show itscapability of generating highly realistic samples for any specified cluster,without using supervised information during training. Lastly, VaDE is aflexible and extensible framework for unsupervised generative clustering, moregeneral mixture models than GMM can be easily plugged in.
展开▼